ABSTRACT

The pre-translational modification of messenger 
ribonucleic acids (mRNAs) by alternative promoter 
usage and alternative splicing is an important 
source of pleiotropy. Despite intensive efforts, our 
understanding of the functional implications of this 
dynamically created diversity is still incomplete. 
Using the available knowledge of interaction 
modules, particularly within intrinsically disordered 
regions (IDRs), we analysed the occurrences of 
protein modules within alternative exons. We find 
that regions removed or included by 
pre-translational variation are enriched in linear 
motifs suggesting that the removal or inclusion of 
exons containing these interaction modules is an 
important regulatory mechanism. In particular, we
observe that PDZ-, PTB-, SH2- and WW-domain
binding motifs are more likely to occur within
alternative exons. We also determine that regions
removed or included by alternative promoter usage
are enriched in IDRs suggesting that protein isoform
diversity is tightly coupled to the modulation of
IDRs. This study, therefore, demonstrates that
short linear motifs are key components for
establishing protein diversity between splice variants.



INTRODUCTION 

The major pre-translational mechanisms for expanding 
the repertoire of gene function are alternative splicing, 
known to occur in at least 86% of human genes (1) and 
alternative promoter usage, known to occur in 30-50% of 
human genes (2), with other mechanisms such as
ribonucleic acid (RNA) editing (3) also contributing to the
diversification of the human proteome. Alternative gene
products increase the signalling and regulatory complexity
of the proteome in a temporal- and tissue-specific
manner (4). This observed proteomic complexity, enabled
by the one-to-many relationship between most genes and
their protein products, raises the question of how these
isoforms confer distinct functionality.

A potential explanation for this functional diversity at 
the protein level is modulation of domain-domain inter- 
actions (5); however, proteome-wide studies indicate that 
the removal of a globular domain is a relatively rare event 
(6,7). Instead, studies have shown that intrinsically dis- 
ordered regions (IDRs) are preferentially found within 
alternative exons (8,9). This enrichment for IDRs does 
not explain the functional diversity found between many 
alternative protein products of a gene. For example, 
alternative splicing is known to determine the binding 
properties, stability, subcellular localisation and post- 
translational modifications (PTMs) of a large number of 
proteins (4,10). As short linear-motif (SLiM) interaction 
modules are enriched within IDRs (11,12), we 
hypothesised that the removal or addition of SLiM- 
containing exons could confer distinct functions to a 
splice variant, as these interaction modules are associated 
with a diverse array of cellular processes (11,13). These 
include promoting transport [e.g. the nuclear localisation 
signal (NLS)], directing cleavage (e.g. caspase-3 scission 
sites), acting as sites for PTMs (e.g. phosphorylation 
sites), mediating ligand binding (e.g. the PxxP SH3- 
binding motif) and marking proteins for degradation 
(e.g. KEN-box motif) (14). 

SLiMs (~3-10 amino acids in length) are typically
associated with low-affinity interactions [generally in the
1-150 μM range (14)], predisposing them to reversible and
transient associations (11). Although the context of a
motif is important for binding (15), the majority of the 
binding affinity and specificity arises from a limited 
number of amino acids, 2-5 on average (11). This 
ensures that only a few stochastic mutations are required 
to convergently evolve a functional motif. For example, in 
neuronal cells, a single mutation of an innocuously 
exposed TQG sequence can create a TQT dynein-binding 
motif resulting in the synaptic transport of the protein 
(16). In this manner, motifs can arise by convergent evo- 
lution (11), with stochastic mutations more likely to occur 
in regions with high substitution rates, such as IDRs (17) 
and alternative exons (18). The presence of motifs within 
alternative exons can, in turn, create novel functionality
for splice variants. For example, the inclusion of an alter- 
native exon 3 amino acids in length creates a 
dynein-binding motif in a splice variant of myosin Va, 
enabling splice variant-specific cargo recognition (19). 
The presence of SLiMs in alternative exons can also 
create splice variants with novel cellular localisations, as 
occurs in human 8-oxoguanine deoxyribonucleic acid 
(DNA) glycosylase when an alternative exon containing 
an NLS targeting motif is removed, leading to the exclu- 
sion of the splice variant from the nucleus (20). 

In this article, we investigate whether the removal or 
addition of exons containing SLiMs is a common regula- 
tory mechanism used by the cell. We analyse the experi- 
mentally validated SLiM instances annotated in the 
Eukaryotic Linear Motif (ELM) (13) and Domino (21) 
resources, along with other functional units (globular 
domains, phosphorylation sites, transmembrane regions 
and signal peptides), for their presence in sequences 
altered between known protein isoforms (AltSeqs). We 
demonstrate that SLiMs are enriched within AltSeqs 
and confirm that partial domains are under-represented 
within AltSeqs (6). We also demonstrate that exons 
excluded or included by alternative promoter usage are 
enriched with IDRs demonstrating that these unstructured 
regions of proteins are a recurring property of non- 
constitutive exons. 



MATERIALS AND METHODS 

Data sets 

SLiM instances are extracted from the ELM resource 
(version 08/2011) (13), a database of manually annotated 
experimentally verified SLiMs. These SLiMs are divided 
by ELM into different functional classes (160 in total), 
each describing a unique molecular function. Each ELM 
class is described by a regular expression defined using 
experimentally validated SLiM instances. These 1595 in- 
stances represent a gold standard for SLiM annotation 
and were collected independently of whether they were 
present in alternative exons. 

An additional data set is also derived from the Domino 
peptide interaction database (version 10/2009) (21) to 
validate the results produced using data from the ELM 
resource. Domino annotates high-quality experimental
data on globular domain-peptide interactions
independently of ELM and, therefore, can be used as a
cross-validation data set. A total of 848 protein isoforms
produced from 274 genes are extracted from the
Domino resource with peptides shorter than 30 amino
acids. A maximum length of 30 amino acids is chosen, as
this is shorter than all known linear-motif interaction
domains (the shortest being the WW domain) (22). Five
linear-motif classes, whose interactions have been analysed
in greater detail by high-throughput (HTP) studies and/or
curated by experimental annotation databases, are
investigated in detail. These linear-motif instances bind to
PDZ (23), PTB (24), SH2 (13,25), SH3 (26) and WW (13,27)
domains and together create a data set of 408 motif
instances within 302 genes.



As additional annotation, for each canonical protein 
sequence with a known motif instance, globular domains 
are extracted from Pfam v25 (28), phosphorylation sites 
from the low-throughput annotation of Phospho.ELM 
(03/2011) (29) and functional elements (transmembrane 
domains and signal peptides) from UniProt annotation. 
These features are mapped onto the canonical protein 
sequence as defined by UniProt. 

Isoform data are retrieved from UniProt (05/2011) (30),
a manually annotated, non-redundant protein sequence
database. This resource curates an annotated protein splice
variant of a gene only if there is experimental evidence
that it exists or if it has at least one messenger RNA
(mRNA) with correct intron/exon boundaries. It, therefore,
represents a high-quality resource of validated
protein isoforms. The analyses use the canonical isoform
as chosen by UniProt. All protein products of a gene are 
extracted from the UniProt resource for protein sequences 
with at least one ELM-annotated SLiM instance and more 
than one annotated UniProt protein product. 

All data sets are filtered for proteins of high similarity 
using UniRef90 (31) to limit bias introduced by homolo- 
gous proteins with greater than 90% sequence identity. 

Methods 

The enrichment of functional units (SLiMs, phosphoryl- 
ation sites, globular domains and functional elements 
[transmembrane domains and signal peptides]) within alter- 
native sequences (AltSeqs) is assessed based on an approach 
outlined by Kriventseva et al. (6). This method aims to 
evaluate whether there is a preference for certain functional 
units to be altered between protein isoforms. This approach
compares the expected number of instances, calculated
with the assumption that there are no biases in the data
set towards certain functional units being altered, with
the observed number of instances altered between protein
isoforms. AltSeqs are used in this approach, as they are
continuous sections present in canonical protein sequences, 
as prescribed by UniProt, but missing in another protein 
isoform. AltSeqs, therefore, reflect the consequences of
transcript changes at the protein level; for example, an
AltSeq may represent two alternative exons that are
always removed together, ensuring that a globular
domain is never only partially present.
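The definition of an AltSeq given above (a continuous section of the canonical sequence that is missing from another isoform) can be illustrated with a minimal Python sketch. This is an assumption-laden illustration only: the analysis described here relies on UniProt's own annotation of alternative sequences, whereas the sketch recovers such spans by directly comparing two sequences with the standard-library `difflib` module. The function name `find_altseqs` and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def find_altseqs(canonical: str, isoform: str) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans of the canonical sequence
    that are deleted or replaced in the isoform (hypothetical helper)."""
    # autojunk=False so that frequent residues are not discarded as "junk".
    matcher = SequenceMatcher(None, canonical, isoform, autojunk=False)
    spans = []
    for tag, c_start, c_end, _i_start, _i_end in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            spans.append((c_start, c_end))
    return spans


# Toy sequences (not real proteins): the isoform lacks the middle segment.
canonical = "MKLVPQRST" + "GGSGGS" + "WYFHCAIDE"
isoform = "MKLVPQRST" + "WYFHCAIDE"
altseqs = find_altseqs(canonical, isoform)
```

For real data, the annotated isoform differences would be preferable to a sequence comparison, which can be ambiguous in repetitive regions.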

The calculation of the expected number of occurrences 
(e) of functional units within AltSeqs uses a sliding 
window method. This approach requires that AltSeqs 
are randomly distributed within protein sequences. To 
test this assumption, the distribution of annotated 
UniProt AltSeqs is assessed. This analysis found no
strong positional bias for AltSeqs [Supplementary Figure
S1 and Kriventseva et al. (6)]. A sliding window approach
is, therefore, used to calculate the expected number of 
occurrences of functional units. For each AltSeq, a 
window of equal length to the AltSeq (Window) scans 
the AltSeq-containing protein progressing one amino 
acid at a time, counting the functional units (FUNC_AS)
overlapping (partial hits) or within (full hits) the window.
The expected occurrences of partial/full domains,
phosphorylation sites, transmembrane regions, signal
peptides, PTMs and linear motifs are calculated using the
following equation:

$$e = \frac{\mathrm{FUNC}_{AS}}{\mathrm{Window}} \times \mathrm{AltSeqsCount} \qquad (1)$$

where FUNC_AS = number of instances of a functional
unit, j, in sliding windows; Window = number of sliding
windows; AltSeqsCount = number of AltSeqs. A
χ² goodness-of-fit test is then used to compare the
expected and observed proportions.
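The sliding-window calculation of Equation 1 can be sketched in Python as follows. This is a minimal sketch rather than the implementation used in the study. It assumes that each protein is supplied as a hypothetical tuple of (length, list of AltSeq lengths, list of half-open functional-unit intervals), and that a hit is counted once per functional unit per window. The function names are illustrative.

```python
def count_window_hits(protein_len, window_len, units, mode="full"):
    """Slide a window one residue at a time and count functional units that
    lie entirely within it ("full") or only overlap it ("partial")."""
    n_windows = max(protein_len - window_len + 1, 0)
    hits = 0
    for start in range(n_windows):
        end = start + window_len
        for u_start, u_end in units:
            inside = start <= u_start and u_end <= end
            overlaps = u_start < end and start < u_end
            if mode == "full" and inside:
                hits += 1
            elif mode == "partial" and overlaps and not inside:
                hits += 1
    return hits, n_windows


def expected_in_altseqs(proteins, mode="full"):
    """e = FUNC_AS / Window * AltSeqsCount, pooled over all proteins."""
    func_as = windows = altseq_count = 0
    for protein_len, altseq_lengths, units in proteins:
        for altseq_len in altseq_lengths:
            hits, n_windows = count_window_hits(protein_len, altseq_len, units, mode)
            func_as += hits
            windows += n_windows
            altseq_count += 1
    return func_as / windows * altseq_count if windows else 0.0


# Toy input: one 10-residue protein, one AltSeq of length 4, one unit at [2, 4).
toy = [(10, [4], [(2, 4)])]
e_full = expected_in_altseqs(toy, "full")        # 3 of 7 windows contain the unit
e_partial = expected_in_altseqs(toy, "partial")  # 1 of 7 windows cuts through it
```

Pooling the hit and window counts across proteins before dividing is one reading of the definitions above; a per-protein average would be an alternative design choice.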

The expected number of occurrences of functional units
within regions of intrinsic disorder (e^DIS) is also assessed.
A protein sequence is assessed for disorder using the
IUPred algorithm (32), with the assumption that amino
acids with IUPred scores over 0.4 are disordered (11,12).
The following equation is used:

$$e^{DIS} = \frac{\mathrm{FUNC}_{AS}^{DIS}}{\mathrm{Window}^{DIS}} \times \mathrm{AltSeqsCount}^{DIS} \qquad (2)$$

where FUNC_AS^DIS = number of functional units both in a
disordered region and an AltSeq; Window^DIS = sliding
window count only including windows with an average
IUPred score of over 0.4; AltSeqsCount^DIS = number
of AltSeqs in disordered regions (average IUPred
score > 0.4). A χ² goodness-of-fit test is then used to
compare the expected and observed proportions.
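The disorder-restricted calculation of Equation 2 can be sketched in the same way. The sketch below assumes that per-residue disorder scores have already been computed elsewhere (for example, by IUPred) and are passed in as a plain list; it does not call IUPred itself. The input layout, the function names and the choice to count only units lying fully inside a window are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def expected_in_disordered_altseqs(proteins, threshold=0.4):
    """e_DIS = FUNC_DIS / Window_DIS * AltSeqsCount_DIS.

    proteins: iterable of (scores, altseqs, units), where scores is a list of
    per-residue disorder scores and altseqs/units are non-empty half-open
    (start, end) intervals on the same sequence (hypothetical layout)."""
    func_dis = window_dis = altseq_dis = 0
    for scores, altseqs, units in proteins:
        for a_start, a_end in altseqs:
            # Only AltSeqs whose average score exceeds the threshold are counted.
            if mean(scores[a_start:a_end]) <= threshold:
                continue
            altseq_dis += 1
            width = a_end - a_start
            for start in range(len(scores) - width + 1):
                end = start + width
                # Only windows with an average score above the threshold count.
                if mean(scores[start:end]) <= threshold:
                    continue
                window_dis += 1
                func_dis += sum(
                    1 for u_start, u_end in units
                    if start <= u_start and u_end <= end
                )
    return func_dis / window_dis * altseq_dis if window_dis else 0.0


# Toy input: 4 ordered residues followed by 6 disordered residues.
toy_scores = [0.1] * 4 + [0.8] * 6
e_dis = expected_in_disordered_altseqs([(toy_scores, [(6, 9)], [(5, 7)])])
```

In the toy input, five of the eight windows of width 3 have a mean score above 0.4, and two of those contain the unit, giving 2/5 × 1.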

The assessment of the individual ELM classes for
enrichment within AltSeqs by χ² test is infeasible due to the
limited number of instances in each class. An adaptation
of the log-odds ratio calculation [LOG-odds domain
(LOD) (5)] was, therefore, used to compare individual
ELM classes with the observed occurrence of linear-motif
removal, taking into account the number of
instances in each ELM class:

[Equation 3: LOD expression not recoverable from the source text] (3)

where IC = instance count; P_yy = observed probability of an
instance in an ELM class being in an AltSeq; P_yx = observed
probability of an instance in an ELM class not being in an
AltSeq; P_xx = observed probability of an ELM instance
being in an AltSeq; P_xy = observed probability of an ELM
instance not being in an AltSeq.

Counts of recurring SLiMs, that is, SLiM instances that have
at least one other instance of the same ELM class in the
same protein, are calculated from the ELM resource's
annotated data. A recent survey of ELM identified that
34.9% of ELM-annotated instances were recurring (11).
A χ² goodness-of-fit test is used to assess whether
recurring motifs are present within AltSeqs
at a higher rate than expected (34.9%).

Structural disorder is assessed using the IDR predictor
IUPred (32) with exons having an average score of greater
than 0.4 considered as unstructured (11,12). The expected
proportion of intrinsic disorder within an exon is
calculated based on an analysis of the protein-coding
exons annotated by EnsEMBL (33) found within the
canonical UniProt human proteins. A χ² goodness-of-fit test
compares the intrinsic disorder of the average exon with
the observed intrinsic disorder of the exons altered by
alternative promoter usage or alternative splicing.



RESULTS

Alternative promoter exons are enriched in IDRs

AltSeqs produced by alternative splicing are enriched for
IDRs (8); however, no investigation of protein-encoding
AltSeqs specifically produced by alternative promoter
usage has been undertaken. We therefore compare
the proportion of IDRs within
the average human exon with the proportion of IDRs
within exons removed or included by alternative
promoter usage. We extract from the UniProt database
(30) a non-redundant set of 188 altered splice variants
derived from 124 genes produced solely by alternative
promoter usage. The analysis of these data using the
IUPred algorithm (scores > 0.4 considered disordered)
identifies an enrichment of IDRs within exons altered by
alternative promoter usage (χ² P value: 0.033) (59
observed and 38 expected) (Figure 1). In addition,
a significant under-representation of ordered
regions is noted within those AltSeqs altered by alternative
promoter usage (χ² P value: 1.32e-3) (61 observed and
102 expected). The exons removed or included by
alternative splicing are also enriched for IDRs (χ² P value:
2.20e-16) as previously shown (8).

[Figure 1 plot; x-axis: intrinsic disorder content (bins include 20-40%, 40-60%, 60-80%); series: All Exons, Alternative Promoter Usage]

Figure 1. A comparison of intrinsic disorder between exons. The
proportion of exons predicted as intrinsically disordered, defined as
residues that the IUPred algorithm predicted with a score above 0.4.
Exons altered by alternative splicing and exons altered by alternative
promoter usage were analysed for IDRs as compared with the average
human exon. Error bars represent 90.0% error rate.

Figure 2. The distribution of functional units within AltSeqs. The observed and expected counts of AltSeqs disrupting an entire or partial SLiM, a
phosphorylation site, an entire or partial globular domain, an entire or partial functional element (transmembrane domain or signal peptide) or no
functional units from the annotated data set. The number of elements in each class is shown and their percentage in brackets. (A) The observed
distribution of functional units within AltSeqs when compared with the expected distribution. (B) The observed distribution of functional units
within AltSeqs when compared with the expected distribution within IDRs (regions with IUPred scores > 0.4). Both partial and entire
transmembrane domains and signal peptides (functional elements) are combined, as their observed occurrences were very low.

SLiMs are enriched in AltSeqs

The enrichment of IDRs within AltSeqs raises the
question of whether known functional regions within
IDRs occur at a higher or lower rate than expected
within sequences altered between protein isoforms. The
initial analysis of functional site enrichment within
AltSeqs is undertaken using a data set of 1421 protein
isoforms produced from 404 genes, which is limited to
those genes with a protein isoform containing an
annotated and experimentally validated SLiM instance
from the ELM resource. As shown in Figure 2a, the
proportion of AltSeqs (average length 112.3 residues,
equivalent to 15.5% of average UniProt sequence length)
containing a SLiM is at a higher frequency than
expected (χ² P value: 5.13e-5) with 196 SLiMs (30.3%
of SLiMs in proteins with alternative products or 12.1%
of total ELM instances) observed in AltSeqs compared
with the 123 expected. Phosphorylation sites are similarly
enriched (χ² P value: 5.76e-4) with 61 more sites found in
AltSeqs than the 128 expected. There is, however, a
potential bias in this analysis, as IDRs are enriched in
alternative exons (8,9) and SLiMs are enriched in IDRs
(11,12). We, therefore, also assessed whether the
aforementioned enrichment still occurs when only regions
predicted as disordered are investigated [IUPred (32)
scores > 0.4 considered as an IDR]. In this case, SLiMs
are the sole functional unit significantly enriched (χ²
P value: 2.40e-4) (138 observed and 83 expected)
(Figure 2b) suggesting a preference for SLiMs in
AltSeqs. These results are validated using the
independently annotated data from the Domino database of
peptide-mediated interactions (21) consisting of 848
protein isoforms produced from 274 genes. Peptides,
likely to contain SLiMs, are highly enriched within
AltSeqs (χ² P value: 4.74e-5) (163 observed and 97
expected). This enrichment of SLiMs is again observed
when only functional units within IDRs are investigated
(χ² P value: 5.71e-3) (106 observed and 69 expected)
(Supplementary Figure S3). For further assessment
of functional site enrichment, additional instances of
PTMs are extracted from the PhosphoSite Plus database
(34). However, no enrichment is identified for these other
PTMs in AltSeqs (Supplementary Table S1). This suggests
that SLiMs represent a key regulatory element altered
between protein isoforms.
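The observed-versus-expected comparisons reported above rest on a χ² goodness-of-fit test. A minimal Python sketch of such a test is given below, under the assumption of two categories (a functional unit is either in an AltSeq or not) and therefore one degree of freedom. The text does not state how the categories were set up, so this sketch is not expected to reproduce the reported P values; the counts used are hypothetical.

```python
import math


def chi_square_gof_two_way(observed_in, total, expected_in):
    """Two-category chi-square goodness-of-fit test (1 degree of freedom).

    observed_in: observed count inside AltSeqs
    total:       total number of units considered
    expected_in: expected count inside AltSeqs under the null model
    """
    observed = (observed_in, total - observed_in)
    expected = (expected_in, total - expected_in)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 of 100 units observed in AltSeqs, 50 expected.
chi2, p = chi_square_gof_two_way(observed_in=60, total=100, expected_in=50.0)
```

With more than two categories (for example, all functional unit classes of Figure 2 at once), the degrees of freedom change and a general χ² survival function would be needed instead of the closed form used here.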

The analysis does not show a bias towards a particular
type of SLiM, for example, targeting motifs, being in
an AltSeq (Supplementary Figure S2). The observation
that SLiMs are enriched within AltSeqs but no particular
ELM type is significantly enriched raises the question:
what types of SLiMs are present within these regions?
We, therefore, assess the individual ELM functional
classes for enrichment within alternative exons (Table 1).
We identify a number of classes whose instances occur at a
much higher frequency than expected in non-constitutive
exons. The majority of these ELM classes bind to domains
found within intracellular signal-transduction proteins
(e.g. SH2 or PTB domains). However, the instances
annotated in the ELM resource are limited in number,
as only examples identified by low-throughput
experimentation are included. To validate the observation
that motif instances associated with domains in
signal-transduction proteins are enriched in protein-encoding
alternative exons, motif instances identified as binding to
PDZ (23), PTB (24), SH2 (13,25), SH3 (26) and WW (13,27)
domains in HTP experiments and by specialist annotation are
investigated further. As shown in Figure 3, the
aforementioned enrichment of motifs binding to the SH2
(χ² P value: 0.027) and PTB (χ² P value: 0.033) domains
is confirmed, and PDZ- (χ² P value: 0.025) and WW-
(χ² P value: 0.092) domain-binding motifs are also found to
have an increased likelihood of being removed or included
between protein isoforms.

SLiMs tend to reoccur, that is, to have one (or multiple) other
instances of the same ELM class in the same protein. This
is highlighted by the fact that 34.9% of ELM-annotated
SLiM instances are recurring (11). When we analyse the
occurrence of these recurring motifs within alternative
exons, we find that SLiMs known to reoccur multiple
times in a protein sequence are significantly enriched
within AltSeqs (50.7% are recurring; χ² P value: 0.013)
Table 1. Preferential alteration of specific ELM classes

ELM ID               Regular expression      Binding domain(a)   No. removed   No. total   Percentage removed   LOD
LIG_SH2_STAT3        (Y)..Q                  SH2                 7             8           87.50                10.89
LIG_SH2_STAT5        (Y)[VLTFIC].            SH2                 7             12          58.33                7.95
LIG_PTB_Apo_2        [^P].NP.(Y)             PTB                 8             19          42.11                7.91
MOD_TYR_ITSM         T.(Y).[IV]              SH2                 6             11          54.55                6.55
LIG_PTB_Phospho_1    [^P].NP.[FY].           PTB                 7             16          43.75                6.51
LIG_SxIP_EBH_1       [ST].[IL]P              EB1                 5             9           55.56                5.52
LIG_PDZ_Class_1      [ST].[ACVILF]$          PDZ                 6             15          40.00                5.11
LIG_EVH1_2           PP.F                    WH1                 4             8           50.00                4.13
MOD_PKA_1            [RK][RK].([ST])[^P].    Pkinase             7             23          30.43                3.62
MOD_ProDKin_1        ([ST])P.                Pkinase             8             28          28.57                3.32
TRG_ENDOCYTIC        Y.[LMVIF]               Adap_comp_sub       3             7           42.86                2.74

The number of instances per ELM class that occur in non-constitutive protein-encoding exons (removed), the total number of instances annotated
and the log-odds ratio (LOD), Equation 3, for statistical significance as a function of the total counts of the domain (5) are presented. The regular
expressions are annotated in the ELM resource. Only ELM classes with more than five annotated instances were assessed, and only those with a
LOD score >2 are shown. $ = C terminal; (X) = residue X must be modified for binding (e.g. by phosphorylation); [^P] = proline residue not
allowed; . = any amino acid; [XYZ] = either residue X, Y or Z is allowed at this position.
(a) Names based on annotation by the Pfam resource.
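The regular expressions listed in Table 1 use a syntax that Python's standard `re` module accepts directly, so scanning a sequence for candidate matches can be sketched as below. Three patterns are copied from the table; the sequence, the AltSeq interval and the helper names are hypothetical. A regular-expression match is only a candidate: it says nothing about modification state or structural context, and such matches have a high false-positive rate.

```python
import re

# Patterns taken from Table 1 ($ anchors the match at the C terminus).
ELM_PATTERNS = {
    "LIG_PDZ_Class_1": r"[ST].[ACVILF]$",
    "LIG_SxIP_EBH_1": r"[ST].[IL]P",
    "TRG_ENDOCYTIC": r"Y.[LMVIF]",
}


def find_motifs(sequence, patterns=ELM_PATTERNS):
    """Return (class name, start, end) for each non-overlapping match,
    sorted by start position; intervals are half-open."""
    hits = []
    for name, pattern in patterns.items():
        for match in re.finditer(pattern, sequence):
            hits.append((name, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


def overlaps_altseq(hit, altseq):
    """True if a motif hit shares at least one residue with an AltSeq interval."""
    _name, start, end = hit
    a_start, a_end = altseq
    return start < a_end and a_start < end


# Toy sequence (not a real protein) and a hypothetical C-terminal AltSeq.
toy_sequence = "MAAYQLGGSKIPAAETSV"
hits = find_motifs(toy_sequence)
c_terminal_altseq = (12, 18)
altered = [hit for hit in hits if overlaps_altseq(hit, c_terminal_altseq)]
```

In the table's notation, parentheses such as (Y) mark a residue that must be modified; in Python they simply act as a capture group, which is harmless for matching.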




[Figure 3 plot; x-axis: Binding Motif (PDZ, PTB, SH2, SH3, WW)]

Figure 3. A comparison of the occurrences of five highly studied
binding motifs within alternative exons. The observed and expected
occurrences of linear motifs identified in HTP experimental studies of
proteins with known isoforms (30). The PDZ-, PTB- and SH2-binding
sites are all enriched within AltSeqs (χ², P < 0.05). The WW
domain-binding motif is not enriched to a level of statistical significance
but has 26 instances observed in AltSeqs compared with the 15.2
instances expected. The expected occurrence of binding motifs was
calculated using Equation 1 except for the PDZ-binding motif, which
was calculated based on the occurrence of AltSeqs at the C terminal.
(* = statistical enrichment; numbers = occurrences).



(107 observed and 73.6 expected). This suggests that
the inclusion or removal of SLiM-containing exons may
tune the multivalent cooperativity of an isoform's
interactions (35).
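The notion of a recurring SLiM used above (an instance with at least one other instance of the same ELM class in the same protein) is straightforward to compute from an instance list. The sketch below assumes a hypothetical input of (protein identifier, ELM class, start position) tuples; the identifiers and positions are invented for illustration.

```python
from collections import Counter


def recurring_fraction(instances):
    """Count instances that share both protein and ELM class with at least
    one other instance. Returns (n_recurring, n_total, fraction)."""
    per_protein_class = Counter(
        (protein, elm_class) for protein, elm_class, _start in instances
    )
    n_recurring = sum(
        1 for protein, elm_class, _start in instances
        if per_protein_class[(protein, elm_class)] >= 2
    )
    n_total = len(instances)
    fraction = n_recurring / n_total if n_total else 0.0
    return n_recurring, n_total, fraction


# Hypothetical annotations: only the two STAT5 sites in P1 recur.
toy_instances = [
    ("P1", "LIG_SH2_STAT5", 10),
    ("P1", "LIG_SH2_STAT5", 42),
    ("P1", "LIG_PDZ_Class_1", 97),
    ("P2", "LIG_SH2_STAT5", 5),
    ("P2", "MOD_PKA_1", 30),
]
n_recurring, n_total, fraction = recurring_fraction(toy_instances)
```

The resulting fraction, computed for instances inside AltSeqs, is the quantity that would be compared against the 34.9% background rate.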

The removal or inclusion of complete globular domains
also displays a weak statistical enrichment within AltSeqs
(χ² P value: 1.29e-3) (204 observed and 144 expected)
reflecting similar results by Kriventseva et al. (6).
Conversely, splice variants with partial domains or
domains that are partially encoded by an AltSeq are
under-represented (χ² P value < 1.59e-3) (161 observed
and 223 expected) (6,7). A similar finding is also
observed for functional elements truncated by the
removal or inclusion of AltSeqs (15 observed and
85 expected) (Figure 2a) (6).

SLiM prediction can aid understanding of protein 
isoforms 

The diverse and often conflicting functions of the different
protein products of a gene are frequently attributed
to a single protein isoform. The use of bioinformatics to
discriminate these differences, in particular by identifying
isoform-specific SLiMs, may facilitate an understanding
of the distinct properties of these splice variants, such as
half-life, interaction partners and cellular localisation.

An apt example of this is p53, whose varied and often 
opposing functions have puzzled researchers for many 
years (36). The recent expansion in the number of 
known alternative protein products of this gene has 
given a tantalising opportunity to uncover the source of 
this functional diversity (37). A series of articles focusing 
on the transcriptional regulation of these splice isoforms 
[e.g. (37,38)] has enabled some of this diversity to be 
explored. In Figure 4, the alternative products of p53
are displayed along with their DNA-binding domains
and SLiMs. The different phenotypes of p53 isoforms
can often be attributed to the removal or inclusion of
SLiM-containing exons. For example, the increased
half-life of Δ40p53 (9.5 h compared with 5-20 min for
full-length p53) (38) has been attributed to the loss of
the MDM2-binding site (39), which marks full-length
p53 for degradation by the attachment of ubiquitin (40).
Similarly, the absence of the nuclear export signal in
Δ40p53γ explains its exclusive nuclear localisation (in a
similar manner to p53β and Δ133p53γ) (38). Other
putative explanations of phenotypic observations include
attributing the shorter half-life of p53β to the absence of
Figure 4. Bioinformatics can identify functional differences between protein variants. (A) The exon sequence of the TP53 gene. The coloured
(non-grey) exons are alternative exons that vary between protein isoforms. The yellow exons are absent in Δ40p53, the mauve exon is specific to
p53β, the mint green exon is exclusive to p53γ and the orange exons are present only in p53α. (B) The four distinct isoforms of TP53 are shown with
modular architecture annotated onto the full-length protein isoform (p53α) using ELM and Pfam, the exception being the KEN-box, which is
predicted. The protein sequences of the TP53 isoforms are shown as a grey line, SLiMs in red, globular domains in blue and previously shown
modular structures are opaque. Sequence diversity between the alternative protein products leads to changes in the SLiM content of the p53
alternative products, for example, p53γ loses a cyclin binding site and two USP7 binding sites, a nuclear export signal and a 14-3-3 binding site.


the LIG_USP7_1 (41) and 14-3-3 (42) binding sites. Novel
motifs can also arise by addition of an alternative exon,
such as the putative KEN-box degron in p53γ, which
could indicate a novel method of degradation for p53γ
by the APC/C complex during anaphase. The expression
and half-life of p53 are, therefore, carefully regulated by
pre- and post-translational mechanisms that alter the
availability of this protein's interaction surfaces resulting in
subtle but important phenotypic differences (37,38).
Observations based on the interpretation of phenotypic 
data can help direct further experimentation, which may 
further elucidate the often-enigmatic differences between 
protein isoforms. 



DISCUSSION 

The importance of pre-translational variation within the 
cell for facilitating cell signalling and regulation is 
becoming increasingly apparent (43-45). The inclusion 
or removal of non-protein coding regions, for example, 
is known to influence mRNA stability, translational effi- 
ciency and mRNA localisation (46,47). In this article, we 
have investigated how the removal/inclusion of functional 
modules between protein isoforms can lead to functional 
diversity. In particular, we have focused on the inclusion/ 
removal of SLiM-containing alternative exons known 
to create protein isoforms of differing functions (10). 
These differences include the targeting of protein splice 
variants to different sub-cellular locations [e.g. to the per- 
oxisome rather than the mitochondria (48)], changes in 
interaction partners [e.g. PDZ SLiMs within membrane 
receptors (49)] or more dramatic changes such as 
altering a protein's function from pro-apoptotic to 
anti-apoptotic (50). 

In this article, we have confirmed previous observations
that alternatively spliced exons are enriched for IDRs (8,9)
and demonstrated a similar enrichment for IDRs in
exons generated by alternative promoter usage (Figure 1).
This observation prompted us to investigate the propen- 
sity of known functional protein modules to occur in 
regions altered by alternative splicing and/or by variable 
promoter usage. We identified an enrichment of SLiMs 
within AltSeqs indicating that the inclusion or removal 
of motif-containing exons is an important mechanism 
for modifying the functional properties of alternative 
protein products. In particular, exons containing SLiMs 
that bind to SH2 domains are commonly altered by 
pre-translational mechanisms (Table 1 and Figure 3). 
SH2-binding motifs are often present in the cytoplasmic 
tails of membrane receptors, and their inclusion or 
removal is, for example, known to affect the multiva- 
lent assembly of regulatory complexes important for 
signal propagation (51,52). Similarly, the inclusion 
or removal of PDZ motifs, also found enriched in 
AltSeqs, is known to create functional diversity. For 
example, in neurons, splice variants differing in 
their C-terminal PDZ motifs play specific roles in the regu- 
lation of neurotransmission, ion channel function and 
development (49). 

The small footprint of linear motifs confers a number of 
advantages in terms of cell regulation and signalling 
(11,53,54). First, the limited number of residues in a 
SLiM that contribute to binding usually leads to a 
binding affinity for SLiM-mediated interactions in the 
micromolar range. Consequently, motif-mediated inter- 
actions are predominately both transient and reversible 
(14). This reliance on a limited number of amino acids 
means that SLiM-mediated interactions can be 
weakened (or strengthened) by PTMs, whose bulk and 
charge can disrupt (or enhance) this weak binding 
affinity. Similarly, the short length of linear motifs 
means these interaction modules can often occur 
multiple times in a single protein (11). This can facilitate
mutually exclusive binding, when two motifs share a 
binding surface (for example, when they overlap) or 
promote high-avidity interactions, when motifs reoccur 
in separate positions along a protein sequence. These 
switching mechanisms act to regulate SLiM-mediated 
interaction. Alternative splicing and other pre- 
translational mechanisms can therefore alter the 
regulation of a protein by including or removing SLiM- 
containing exons. For example, altering the number of 
reoccurring motifs in a protein by exon removal/inclusion 
can change the avidity of SLiM-mediated interactions, 
tuning the sensitivity of signalling pathways in a 
temporal- and tissue-specific manner [e.g. (55)]. In this 
article, we have demonstrated that these reoccurring 
motifs are enriched in AltSeqs, suggesting that the 
inclusion/removal of reoccurring SLiMs is a mechanism 
commonly used by the cell. Similarly, an exon boundary
intersecting two overlapping SLiMs can facilitate the
production of one isoform with an overlapping pair of
motifs capable of acting as a regulatory switch and
another isoform with just a single motif [e.g. (56,57)].
Linear motifs are, therefore, susceptible to a multipli- 
city of regulatory mechanisms that are important in 
regulating signalling within the cell. These regulatory 
features can be manipulated by the inclusion or removal 
of non-constitutive exons to create important but often 
subtle differences in the regulation and function of a 
protein. 

The high false-positive rate of SLiM prediction (11) 
means the scope of this analysis is limited to the annotated 
SLiM data sets available from the ELM and Domino re- 
sources as well as data from HTP experimental studies. 
Despite this limitation, we have still been able to 
demonstrate a statistical enrichment of SLiMs within 
AltSeqs, suggesting an important role for motifs in the 
functional diversification and regulation of alternative 
protein products. An appreciation of how functional dif- 
ferences can arise between protein isoforms is key to our 
understanding of proteomic diversity. This is important as 
up to one-half of disease-causing mutations affect splicing 
(58) with several examples of the inclusion/exclusion 
of SLiM-containing exons producing disease-specific 
isoforms (59-61). An example of this is Hoyeraal-Hreidarsson
syndrome, a rare genetic disorder
characterised by premature ageing, in which an aberrant 
splice variant of the Apollo gene is expressed that lacks a 
telomeric repeat-binding factor 2 (TRF2)-binding motif. 
This Apollo splice variant is unable to bind the TRF2 
protein leading to telomeric dysfunction and cellular 
senescence (61). Approaches are being developed to 
target this type of aberrant splicing event by redirecting 
alternative splicing. The principle of this approach is to
redirect the splicing of a transcript to promote the produc- 
tion of a favourable isoform in preference to the unfavour- 
able splice variant (62). This could have therapeutic 
potential as demonstrated in Duchenne muscular dystro- 
phy (63) and a melanoma model (64). An appreciation of 
the protein interaction modules most commonly altered 
between protein isoforms can help target these problems 
more precisely. 